RDataTracker and DDG Explorer - Capture, Visualization and Querying of Provenance from R Scripts
Abstract
Scientific data provenance is gaining interest among both scientists and computer scientists. The current state of the art in provenance capture requires scientists to adopt new technologies, most commonly workflow systems such as Kepler [BML06], VisTrails [SKS08] or Taverna [MBZ08], among others. While there are likely additional benefits to adopting these systems, they present a hurdle to scientists who are more interested in focusing on science than in learning new technologies. The work described in this poster explores the extent to which we can support scientists while expecting only a minimal investment of additional effort on their part. This work has been developed in collaboration with ecologists at Harvard Forest, a 3,500-acre facility operated by Harvard University that serves as a Long-Term Ecological Research (LTER) site funded by the National Science Foundation. Many of these ecologists perform data analysis using R, a widely used scripting language that includes extensive statistical analysis and plotting functionality. These scientists are committed to understanding their data, ensuring that their data analyses are done appropriately, and sharing their data and results with others. For these reasons, they appreciate the value that collecting data provenance may have, but they are not enthusiastic about learning new tools. In this poster, we present two tools aimed at this audience: RDataTracker and DDG Explorer. RDataTracker [LB14] collects data provenance during the execution of an R script. DDG Explorer is used to examine and query the resulting data provenance.
Similar resources
Collecting Provenance in an Interactive Scripting Environment
Scientific data provenance is often cited as a valuable tool for scientists to use to document their data collection and analysis processes, allowing improved understanding and sharing of data and results. However, most software that supports data provenance requires scientists to adopt new technologies rather than adding these capabilities to technologies that scientists already use. In this p...
Collecting and Analyzing Provenance on Interactive Notebooks: When IPython Meets noWorkflow
Interactive notebooks help users explore code, run simulations, visualize results, and share them with other people. While these notebooks have been widely adopted in teaching as well as by scientists and data scientists that perform exploratory analyses, their provenance support is limited to the visualization of some intermediate results and code sharing. Once a user arrives at a result, it i...
Linking Prospective and Retrospective Provenance in Scripts
Scripting languages like Python, R, and MATLAB have seen significant use across a variety of scientific domains. To assist scientists in the analysis of script executions, a number of mechanisms, e.g., noWorkflow, have been recently proposed to capture the provenance of script executions. The provenance information recorded can be used, e.g., to trace the lineage of a particular result by ident...
noWorkflow: Capturing and Analyzing Provenance of Scripts
We propose noWorkflow, a tool that transparently captures provenance of scripts and enables reproducibility. Unlike existing approaches, noWorkflow is non-intrusive and does not require users to change the way they work – users need not wrap their experiments in scientific workflow systems, install version control systems, or instrument their scripts. The tool leverages Software Engineering tec...
Tracking and Analyzing the Evolution of Provenance from Scripts
Script languages are powerful tools for scientists. Scientists use them to process data, invoke programs, and link program outputs/inputs. During the life cycle of scientific experiments, scientists compose scripts, execute them, and perform analysis on the results. Depending on the results, they modify their script to get more data to confirm the original hypothesis or to test a new hypothesis...